26 research outputs found
Improving Word Association Measures in Repetitive Corpora with Context Similarity Weighting
Peer reviewed
Chapter 26 Language technology approach to “seeing” in Akkadian
One way to understand the meanings of words is through their distributional properties. Such a methodology offers an interesting quantitative viewpoint on the lexicography of long-extinct languages. This chapter explores the use of Pointwise Mutual Information (PMI), a well-known statistical word association measure used in collocation analysis. PMI is applied to the data to gain insights into the semantic nuances of the Akkadian verbs of seeing (amāru, naṭālu, palāsu, dagālu, ḫiātu, barû, and subbû). To evaluate the data-driven results, the findings are compared to previous philological work by Ainsley Dicks. The analysis of the top-ranked PMI-extracted collocates provides a good overview of the typical semantic differences between the seven verbs of interest.
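As a rough illustration of the measure named in this abstract, PMI for a word pair is commonly defined as log2(p(a, b) / (p(a) p(b))). The sketch below is not the chapter's implementation: it assumes co-occurrence is counted per text segment (for example, a line), and the function name `pmi_scores` and the toy English tokens are hypothetical stand-ins for lemmatized corpus data.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(segments):
    """Return PMI for every word pair that co-occurs in at least one segment.

    Probabilities are estimated as the fraction of segments containing a
    word (or both words of a pair). This is one of several common
    estimation choices, assumed here for simplicity.
    """
    n = len(segments)
    word_df = Counter()   # number of segments containing each word
    pair_df = Counter()   # number of segments containing both words of a pair
    for seg in segments:
        types = sorted(set(seg))          # count each word once per segment
        word_df.update(types)
        pair_df.update(combinations(types, 2))

    scores = {}
    for (a, b), c in pair_df.items():
        # log2( p(a,b) / (p(a) * p(b)) ) with p(x) = df(x) / n
        scores[(a, b)] = math.log2(c * n / (word_df[a] * word_df[b]))
    return scores


# Toy data (hypothetical): four short segments of already-lemmatized tokens.
toy_segments = [
    ["see", "face"],
    ["see", "face"],
    ["look", "eye"],
    ["look", "eye"],
]
toy_scores = pmi_scores(toy_segments)
```

Ranking the pairs that contain a verb of interest by descending score would give a list of top collocates of the kind the abstract describes. In practice, low-frequency pairs tend to receive inflated PMI values, so a minimum frequency threshold is usually applied.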
BabyLemmatizer : A Lemmatizer and POS-tagger for Akkadian
We present a hybrid lemmatizer and POS-tagger for Akkadian, the language of the ancient Assyrians and Babylonians, documented from 2350 BCE to 100 CE. In our approach, the text is first POS-tagged and lemmatized with TurkuNLP trained with human-verified labels, and then post-corrected with dictionary-based methods to improve the lemmatization quality. The post-correction also assigns labels with confidence scores to flag the most suspicious lemmatizations for manual validation. We demonstrate that the presented tool achieves a Lemma+POS labeling accuracy of 94% and a lemmatization accuracy of 95% on a held-out test set.
Peer reviewed
Digital Approaches to Analyzing and Translating Emotion : What Is Love?
This chapter discusses the use of digital tools – in particular, language technology – to study the history of emotions. There are a growing number of annotated text corpora for ancient languages large enough to benefit from computational analysis. This chapter focuses on the cuneiform Akkadian texts available in the Open Richly Annotated Cuneiform Corpus (Oracc) and applies two language-technological methods, pointwise mutual information (PMI) and the fastText implementation of the continuous skip-gram model, to a dataset of 7,346 texts. To illustrate the potential of these methods, they are used to analyze the semantic domains of the verb râmu, “to love,” and its derivatives in Akkadian. Because the usage and semantic domains of a word can vary greatly between different genres, the dataset is divided into several genres, and the analysis focuses on royal inscriptions, letters, and literary text genres. The results show that, like the word love in English, râmu can denote different aspects of affection and love. It refers, for example, to erotic and sexual relationships between people, affection between family members, the king’s love of justice, and the gods’ pleasure with and acceptance of the king who fulfills divine expectations.
Peer reviewed
Semantic Domains in Akkadian Text
The article examines the possibilities offered by language technology for analyzing semantic fields in Akkadian. The data for our research group come from an existing electronic corpus, the Open Richly Annotated Cuneiform Corpus (ORACC). In addition to more traditional Assyriological methods, the article explores two language-technological methods: pointwise mutual information (PMI) and Word2vec.
Peer reviewed
Internship report in community pharmacy
Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the University of Coimbra